A Framework for Multilevel Linguistic Annotations
Abstract
This article presents a three-step model for multilayer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view of the same document. This principle can be expressed first with a general relational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. Exploiting this kind of annotated corpus requires efficient manipulation processes and reversive access. We propose a third-step representation based on a set of optimised FSA resulting from the parsing of the XML documents. These proposals have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
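As a rough illustration of the "one document, several views" idea, the sketch below stores two annotation layers alongside a shared token sequence in XML and rebuilds each layer as a separate view. The element and attribute names (`corpus`, `text`, `tok`, `layer`, `ann`, `ref`, `value`), the sample sentence, and the `load_views` helper are all assumptions made for this example; they are not the schema or tooling described in the article, and the FSA compilation step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off encoding: every element and attribute name here
# is an illustrative assumption, not the article's actual XML application.
DOC = """
<corpus>
  <text>
    <tok id="t1">Le</tok>
    <tok id="t2">Monde</tok>
    <tok id="t3">arrive</tok>
  </text>
  <layer name="pos">
    <ann ref="t1" value="DET"/>
    <ann ref="t2" value="NOUN"/>
    <ann ref="t3" value="VERB"/>
  </layer>
  <layer name="chunk">
    <ann ref="t1" value="NP"/>
    <ann ref="t2" value="NP"/>
    <ann ref="t3" value="VP"/>
  </layer>
</corpus>
"""


def load_views(xml_text):
    """Return {layer name: [(token form, label), ...]} for each layer."""
    root = ET.fromstring(xml_text)
    # The token sequence is shared by every annotation layer.
    tokens = [(tok.get("id"), tok.text) for tok in root.find("text")]
    views = {}
    for layer in root.findall("layer"):
        # Map token ids to the label this layer assigns to them.
        labels = {a.get("ref"): a.get("value") for a in layer.findall("ann")}
        views[layer.get("name")] = [
            (form, labels.get(tok_id)) for tok_id, form in tokens
        ]
    return views


views = load_views(DOC)
```

Each entry in `views` is one reading of the same underlying text, so a new layer can be added without touching the tokens or the other layers.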
Related resources
ANNIS: Complex Multilevel Annotations in a Linguistic Database
We present ANNIS, a linguistic database that aims at facilitating the process of exploiting richly annotated language data by naive users. We describe the role of the database in our research project and the project requirements, with a special focus on aspects of multilevel annotation. We then illustrate the usability of the database by illustrative examples. We also address current challenges...
Semi-Automatic Phonological Annotations of Speech by Grammatical Inference
This paper describes a technique for automatically generating multiple levels of linguistic annotation for a corpus of speech utterances. Using a training corpus of multilevel annotations, a corresponding finite-state representation is automatically constructed by grammatical inference. This finite-state description is then employed as a knowledge component to automatically generate a new multi...
Representing and Accessing Multilevel Linguistic Annotation using the MEANING Format
We present an XML annotation format (MEANING Annotation Format, MAF) specifically designed to represent and integrate different levels of linguistic annotations and a tool that provides flexible access to them (MEANING Browser). We describe our experience in integrating linguistic annotations coming from different sources, and the solutions we adopted to implement efficient access to corpora an...
Towards a formal framework for linguistic annotations
‘Linguistic annotation’ is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have foc...
A framework for representing and managing linguistic annotations based on typed feature structures
In this paper we present a framework for dealing with linguistic annotations. Our aim is to establish a flexible and extensible infrastructure which follows a coherent and general representation scheme. This proposal provides us with a well-formalized basis for the exchange of linguistic information. We use TEI-P4 conformant feature structures as a representation schema for linguistic analyses....
Publication date: 2011